An Improved Random Forest Algorithm for Prediction of Protein-Protein Interaction
Author
Abstract
Protein-protein interaction (PPI) is the association of two or more proteins as a result of biochemical events in a living cell. Protein domains are the functional and/or structural units of a protein and are consequently responsible for protein-protein interactions. Many machine-learning approaches using domain-based models have been proposed for protein interaction prediction, and their feasibility has been demonstrated. In this study, we developed a domain-based predictor built on the Random Forest (RF) algorithm with the CPRS method, which uses a cost-proportional roulette sampling technique to create the training samples for constructing the "forest". Moreover, the paper applies the principle "two proteins interact with each other when they have the same function and position". Experimental results on a Saccharomyces cerevisiae dataset show that our protein-protein interaction predictor outperforms some existing models, with a sensitivity of 81.7% and a specificity of 73.6%.
Keywords: Random Forest, Protein-protein interaction, Domain, Roulette Sampling Technique
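The abstract names the CPRS method but does not spell out how costs are assigned or how the samples feed each tree. Below is a minimal Python sketch under stated assumptions: each training pair gets a hypothetical per-class misclassification cost (`cost_pos`, `cost_neg`), and each tree's training sample is drawn with replacement by roulette-wheel selection, so a pair is picked with probability proportional to its cost. The tree learner itself is omitted, and the helper names (`roulette_sample`, `cprs_training_samples`, `sensitivity_specificity`) are illustrative, not the authors' code. A helper for the sensitivity and specificity measures reported above is included for completeness.

```python
import bisect
import itertools
import random


def roulette_sample(costs, n_draws, rng):
    """Draw n_draws example indices with replacement, choosing index i with
    probability costs[i] / sum(costs) (roulette-wheel selection)."""
    if any(c < 0 for c in costs):
        raise ValueError("costs must be non-negative")
    cumulative = list(itertools.accumulate(costs))
    if not cumulative or cumulative[-1] <= 0:
        raise ValueError("need at least one positive cost")
    total = cumulative[-1]
    picks = []
    for _ in range(n_draws):
        # Spin the wheel: a uniform point in [0, total) falls into the slot
        # of example i, whose width equals costs[i].
        r = rng.random() * total
        while r >= total:  # guard against rare floating-point round-up
            r = rng.random() * total
        picks.append(bisect.bisect_right(cumulative, r))
    return picks


def cprs_training_samples(labels, cost_pos, cost_neg, n_trees, seed=0):
    """Build one cost-proportional training sample (a list of example
    indices) per tree. HYPOTHETICAL cost scheme: interacting pairs
    (label 1) get cost_pos, non-interacting pairs (label 0) get cost_neg."""
    costs = [cost_pos if y == 1 else cost_neg for y in labels]
    rng = random.Random(seed)
    # Each tree gets as many draws as there are examples, like a bootstrap.
    return [roulette_sample(costs, len(labels), rng) for _ in range(n_trees)]


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Toy data: 2 interacting pairs, 6 non-interacting pairs.
    labels = [1, 1, 0, 0, 0, 0, 0, 0]
    samples = cprs_training_samples(labels, cost_pos=3.0, cost_neg=1.0, n_trees=3)
    for i, idx in enumerate(samples):
        n_pos = sum(labels[j] for j in idx)
        print(f"tree {i}: {n_pos} of {len(idx)} draws are interacting pairs")
```

With `cost_pos=3.0` and `cost_neg=1.0`, the interacting pairs make up 25% of the toy data but are expected to fill half of each tree's sample (2 × 3 / (2 × 3 + 6 × 1) = 0.5). That shift is the point of weighting the wheel by cost instead of sampling uniformly as a standard bootstrap does.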
Similar Sources
Propensity based classification: Dehalogenase and non-dehalogenase enzymes
The present work was designed to classify and differentiate dehalogenase enzymes from non-dehalogenases (other hydrolases) using amino acid propensities at the core, at the surface, and at both. The data sets were built individually by selecting the 3D protein structures available in the PDB (Protein Data Bank). The core amino acids were predicted by I...
Prediction of Protein Sub-Mitochondria Locations Using Protein Interaction Networks
Background: Prediction of protein localization is among the most important problems in bioinformatics; it is used to predict the location of proteins in cells and in organelles such as mitochondria. In this study, several machine learning algorithms are applied to predict intracellular protein locations. These algorithms use the features extracted from pro...
Prediction of Protein-Protein Interaction Sites by Random Forest Algorithm with mRMR and IFS
Prediction of protein-protein interaction (PPI) sites is one of the most challenging problems in computational biology. Although great progress has been made by employing various machine learning approaches with numerous characteristic features, the problem is still far from being solved. In this study, we developed a novel predictor based on the Random Forest (RF) algorithm with the Minimum Redund...
Discovering Domains Mediating Protein Interactions
Background: Protein-protein interactions do not provide any direct information about the domains within the proteins that mediate the interactions. The majority of proteins are multi-domain proteins, and the interaction between them is often defined by pairs of their domains. Most previous studies focus only on interacting domain pairs; however, they do not consider the in...
Evaluating the Accuracy of Genomic Prediction under Different Genomic Architectures of Quantitative and Threshold Traits with Imputation of Simulated Genomic Data, Using the Random Forest Method
Genomic selection is a promising yet challenging approach for discovering genetic variants that influence quantitative and threshold traits, with the aim of improving genetic gain and the accuracy of genomic prediction in animal breeding. Since a proportion of genotypes is generally uncalled, accurate genomic prediction requires imputation of the missing genotypes. The objectives of this study were (1) to quantify...